Lecture 2

Latent Gaussian Models and INLA

Sara Martino

Dept. of Mathematical Science, NTNU

Janine Illian

University of Glasgow

Main elements of the INLA methodology

  • Latent Gaussian Models
  • Sparse matrices
  • Laplace approximations
  • …other “technical” details for the especially interested!

Latent Gaussian Models (repetition)

Repetition

Everything in INLA is based on so-called latent Gaussian models

  • A few hyperparameters \(\theta\sim\pi(\theta)\) control variances, range and so on
  • Given these hyperparameters we have an underlying Gaussian distribution \(\mathbf{u}|\theta\sim\mathcal{N}(\mathbf{0},\mathbf{Q}^{-1}(\theta))\) that we cannot directly observe
  • Instead we make indirect observations \(\mathbf{y}|\mathbf{u},\theta\sim\pi(\mathbf{y}|\mathbf{u},\theta)\) of the underlying latent Gaussian field

Repetition

Models of this kind: \[ \begin{aligned} \mathbf{y}|\mathbf{u},\theta &\sim \prod_i \pi(y_i|\eta_i,\theta)\\ \mathbf{\eta} & = A_1\mathbf{u}_1 + A_2\mathbf{u}_2+\dots + A_k\mathbf{u}_k\\ \mathbf{u}|\theta &\sim \mathcal{N}(0,\mathbf{Q}(\theta)^{-1})\\ \theta & \sim \pi(\theta) \end{aligned} \]

occurs in many, seemingly unrelated, statistical models.

Examples

  • Generalised linear (mixed) models
  • Stochastic volatility
  • Generalised additive (mixed) models
  • Measurement error models
  • Spline smoothing
  • Semiparametric regression
  • Space-varying (semiparametric) regression models
  • Disease mapping
  • Log-Gaussian Cox-processes
  • Model-based geostatistics (*)
  • Spatio-temporal models
  • Survival analysis
  • +++

Main Characteristics

  1. Latent Gaussian model \(\mathbf{u}|\theta\sim\mathcal{N}(0,\mathbf{Q}^{-1}(\theta))\)
  2. The data are conditionally independent given the latent field
  3. The predictor is linear with respect to the elements of \(\mathbf{u}\)
  4. The dimension of \(\mathbf{u}\) can be big (\(10^3-10^6\))
  5. The dimension of \(\theta\) should not be too large.

In this talk we will focus on the characteristics of \(\mathbf{u}|\theta\)

The big n problem!

For many interesting applications it is necessary to solve large problems.

  • By large problems we mean models where the size \(n\) of the latent field \(\mathbf{u}\) is between 100 and 100,000. This includes fixed effects, spatial effects and so on.
  • Computations with models of this size are cumbersome!
  • The solution in the INLA world are Gaussian Markov random fields (GMRFs)!

Gaussian Markov random fields

The Gaussian distribution

A Gaussian distribution \(\mathbf{u}\sim\mathcal{N}(0,\Sigma)\) is controlled by:

— The mean vector \(\mu\), which we have set equal to 0. This is the centre of the distribution

— The matrix \(\Sigma\) describes all pairwise covariances \(\Sigma_{ij} = \text{Cov}(u_i, u_j)\)

Formally \[ \pi(\mathbf{u}) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}\mathbf{u}^T\Sigma^{-1}\mathbf{u}\right\} \]

The Gaussian distribution

Advantages 😄👍 (mostly theoretical)

  • Analytically tractable.
  • We have a good understanding of its properties.
  • Basically, it’s easy to do the theory.

Disadvantages 😔👎 (mostly computational)

  • \(\Sigma\) is usually large and dense.
  • Computations scale as \(\mathcal{O}(n^3)\).
  • Not feasible for large problems.

We need to reduce the computational burden !

Sparse matrices

A matrix \(\mathbf{Q}\) is called sparse if most of its elements are zero.

\[ \mathbf{Q} = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0& 2& 0 & 0\\ 0& 0& -1 & 0\\ 0& 1& 0 & 0.4\\ \end{bmatrix} \]

— In order to solve large problems we need to exploit some structural property that makes the computations feasible.

— Very efficient numerical algorithms exist for dealing with sparse matrices.
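As a concrete illustration (my own sketch in Python, not part of the lecture), the 4×4 example below can be stored in compressed sparse row form, keeping only its five non-zero entries plus index arrays instead of all sixteen values:

```python
# Why sparsity helps: store the 4x4 example matrix Q in
# compressed sparse row (CSR) form instead of densely.
import numpy as np
from scipy import sparse

Q_dense = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0, 0.0],
                    [0.0, 0.0, -1.0, 0.0],
                    [0.0, 1.0, 0.0, 0.4]])

Q_sparse = sparse.csr_matrix(Q_dense)

# Only the non-zero entries are stored explicitly
print(Q_sparse.nnz)    # 5
```

Sparse solvers and factorizations then operate only on the stored entries, which is what makes large-\(n\) computations feasible.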

Which matrix should be sparse in a Gaussian field?

Two possible options

\[ \mathbf{u}\sim\mathcal{N}(0,\Sigma) \]

  1. Force the covariance matrix \(\Sigma\) to be sparse

    • \(\Sigma_{ij} = \text{Cov}(u_i, u_j)\) : Covariance between \(u_i\) and \(u_j\)
    • \(\Sigma_{ij} = 0\) \(\longrightarrow\) \(u_i\) and \(u_j\) are independent
    • A sparse covariance matrix implies that many elements of \(\mathbf{u}\) are mutually independent… is this desirable?
  2. Force the precision matrix \(\mathbf{Q} = \Sigma^{-1}\) to be sparse

    • What does \(Q_{ij}\) represent?
    • What does a sparse precision matrix imply?

Example: The AR1 process

Definition

\[ \begin{aligned} \mathbf{i=1}&: u_1 \sim\mathcal{N}\left(0, \frac{1}{1-\phi^2}\right)\\ \mathbf{i=2,\dots,N}&: u_i = \phi\ u_{i-1} +\epsilon_i,\ \epsilon_i\sim\mathcal{N}(0,1) \end{aligned} \]

  • Very common to model dependence in time
  • The joint distribution of \(u_1,u_2,\dots,u_N\) is Gaussian
  • How do covariance and precision matrices look?

Covariance and Precision Matrix for AR1

Covariance Matrix

\[ \Sigma = \frac{1}{1-\phi^2} \begin{bmatrix} 1& \phi & \phi^2 & \dots& \phi^{N-1} \\ \phi & 1& \phi & \dots& \phi^{N-2} \\ \phi^2 & \phi & 1 & \dots& \phi^{N-3} \\ \vdots& \vdots& \vdots& \ddots& \vdots \\ \phi^{N-1} & \phi^{N-2}& \phi^{N-3} & \dots& 1 \end{bmatrix} \]

  • This is a dense matrix.

  • All elements of the \(\mathbf{u}\) vector are dependent.

Covariance and Precision Matrix for AR1

Precision Matrix

\[ \mathbf{Q} = \Sigma^{-1} = \begin{bmatrix} 1& -\phi & 0 & 0 &\dots& 0 \\ -\phi & 1 + \phi^2& -\phi & 0 & \dots& 0 \\ 0 & -\phi & 1+\phi^2 &-\phi & \dots& 0 \\ 0 & 0 & -\phi &1+\phi^2 & \dots & \vdots \\ \vdots& & & \ddots& \ddots& -\phi \\ 0 &0 & 0 & \dots & -\phi& 1 \end{bmatrix} \]

  • This is a tridiagonal matrix, it is sparse

  • The tridiagonal form of \(\mathbf{Q}\) can be exploited for quick calculations.
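A quick numerical check (my own sketch, not lecture code): build the dense AR1 covariance matrix and invert it; everything more than one step off the diagonal vanishes, and the interior diagonal equals \(1+\phi^2\):

```python
# Verify numerically that the AR1 precision matrix is tridiagonal.
import numpy as np

phi, N = 0.6, 6
idx = np.arange(N)
# Sigma_ij = phi^|i-j| / (1 - phi^2)
Sigma = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)

Q = np.linalg.inv(Sigma)

# Entries more than one off the diagonal are (numerically) zero
off = np.abs(idx[:, None] - idx[None, :]) > 1
assert np.allclose(Q[off], 0.0, atol=1e-8)

# Corner diagonal entries are 1, interior ones 1 + phi^2,
# and the off-diagonal band is -phi
print(np.round(Q, 6))
```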

What is the key property of this example that causes \(\mathbf{Q}\) to be sparse?

Conditional independence

The key lies in the full conditionals

\[ u_t|\mathbf{u}_{-t}\sim\mathcal{N}\left(\frac{\phi}{1+\phi^2}(u_{t-1}+u_{t+1}),\ \frac{1}{1+\phi^2}\right),\qquad 1<t<N \]

  • Each timepoint is only conditionally dependent on the two closest timepoints

  • It is useful to represent the conditional independence structure through a graph

Conditional independence and the precision matrix

Theorem: \(u_i\perp u_j|\mathbf{u}_{-ij}\Longleftrightarrow Q_{ij} =0\)

This is the key property! 😃
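The theorem can be checked in a tiny example (a sketch of mine, not from the slides): for a chain graph \(1-2-3\), the precision entry \(Q_{13}=0\), so \(u_1\) and \(u_3\) are marginally correlated but conditionally independent given \(u_2\):

```python
import numpy as np

# Toy 3x3 precision for a chain graph 1 - 2 - 3, with Q[0,2] = 0
Q = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Q)

# Marginally, u1 and u3 ARE correlated...
print(Sigma[0, 2])        # non-zero (0.25 here)

# ...but conditionally on u2 they are independent: the conditional
# precision of (u1, u3) given u2 is just the corresponding Q sub-block
A = [0, 2]
Q_cond = Q[np.ix_(A, A)]              # diagonal, since Q[0,2] = 0
Sigma_cond = np.linalg.inv(Q_cond)
print(Sigma_cond[0, 1])   # 0.0
```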

(Informal) definition of a GMRF

A GMRF is a Gaussian distribution where the non-zero elements of the precision matrix are defined by the graph structure.

In the previous example the precision matrix is tridiagonal since each variable is connected only to its predecessor and successor.

\[ \mathbf{Q} = \Sigma^{-1} = \begin{bmatrix} 1& -\phi & 0 & 0 &\dots& 0 \\ -\phi & 1 + \phi^2& -\phi & 0 & \dots& 0 \\ 0 & -\phi & 1+\phi^2 &-\phi & \dots& 0 \\ 0 & 0 & -\phi &1+\phi^2 & \dots & \vdots \\ \vdots& & & \ddots& \ddots& -\phi \\ 0 &0 & 0 & \dots & -\phi& 1 \end{bmatrix} \]

GMRF and INLA

  • Every component of the latent Gaussian field \(\mathbf{u}\) in an INLA model is always a GMRF!!
  • We will see how the idea of a GMRF extends also to continuously indexed processes (the SPDE approach)

What do we need to be able to compute?

  • We need to compute determinants of large, sparse matrices. This is needed for likelihood calculations.

  • We need to compute square roots of large, sparse matrices. This is needed for likelihood calculations and simulations.

Technically:

  • In the end our computations come down to the Cholesky decomposition, \(\mathbf{Q} = \mathbf{LL}^T\)
  • After \(\mathbf{L}\) is available, things are fast
  • The limiting factor is how quickly the Cholesky factorization can be done
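The three Cholesky-based operations can be sketched as follows (dense NumPy code purely for illustration; in practice a sparse Cholesky factorization from a sparse-matrix library would be used):

```python
# Cholesky-based operations on a precision matrix Q = L L^T:
# log-determinant, solving Q x = b, and sampling from N(0, Q^{-1}).
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.5, 5
# AR1-style tridiagonal precision matrix (positive definite for |phi| < 1)
Q = (np.diag([1.0] + [1 + phi**2] * (n - 2) + [1.0])
     + np.diag([-phi] * (n - 1), 1) + np.diag([-phi] * (n - 1), -1))

L = np.linalg.cholesky(Q)                 # Q = L L^T

# 1. log |Q| = 2 * sum(log diag(L))  -- needed for likelihoods
logdet = 2 * np.log(np.diag(L)).sum()
assert np.isclose(logdet, np.linalg.slogdet(Q)[1])

# 2. Solve Q x = b via two triangular solves
b = rng.standard_normal(n)
x = np.linalg.solve(L.T, np.linalg.solve(L, b))
assert np.allclose(Q @ x, b)

# 3. Sample u ~ N(0, Q^{-1}): solve L^T u = z with z ~ N(0, I),
#    so Cov(u) = L^{-T} L^{-1} = Q^{-1}
z = rng.standard_normal(n)
u = np.linalg.solve(L.T, z)
```

Once \(\mathbf{L}\) is available, each of these operations reuses it, which is why the factorization is the limiting step.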

What is the gain?

  • Temporal structure uses \(\mathcal{O}(n)\) operations
  • Spatial structure uses \(\mathcal{O}(n^{3/2})\) operations
  • Spatio-temporal structure uses \(\mathcal{O}(n^{2})\) operations

Compare this with \(\mathcal{O}(n^{3})\) operations in the general case.

Summary

Gaussian Markov Random Fields (GMRF):

  • Give faster computations, both in INLA and in MCMC schemes (thanks to sparse precision matrices)
  • Keep the analytical tractability of the Gaussian distribution
  • Allow modelling through conditional distributions
  • Naturally arises from the SPDE approach (that we will talk about …..)

Formal definition of a GMRF

Definition

A random vector \(\mathbf{u}\) is said to be a Gaussian Markov random field (GMRF) with respect to the graph \(\mathcal{G}\), with vertices \(\{1, 2,\dots , n\}\) and edges \(\mathcal{E}\), with mean \(\mu\) and precision matrix \(\mathbf{Q}\) if its probability distribution is given by \[ \pi(\mathbf{u}) = \frac{|\mathbf{Q}|^{1/2}}{(2\pi)^{n/2}}\exp\left\{ -\frac{1}{2}(\mathbf{u}-\mu)^T\mathbf{Q}(\mathbf{u}-\mu)\right\} \] and \(Q_{ij} \neq 0\Longleftrightarrow \{i,j\}\in\mathcal{E}\) for all \(i\neq j\)

How INLA works

The INLA recipe

  • Numerical integration schemes
  • The GMRF-Approximation
  • The Laplace approximation
  • …many many many “technical” details

Hierarchical GMRF

  • \(\mathbf{y}|\mathbf{u},\theta\), Observation model (likelihood)
  • \(\mathbf{u}|\theta\), Gaussian random field
  • \(\theta\) Hyperparameters

Extra requirements

  1. The latent field \(\mathbf{u}\) is a GMRF \[ \pi(\mathbf{u}|\theta)\propto\exp\left\{-\frac{1}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right\} \]

    • The precision matrix \(\mathbf{Q}\) is sparse.

Hierarchical GMRF

  • \(\mathbf{y}|\mathbf{u},\theta\), Observation model (likelihood)
  • \(\mathbf{u}|\theta\), Gaussian random field
  • \(\theta\) Hyperparameters

Extra requirements

  1. The latent field \(\mathbf{u}\) is a GMRF \(\pi(\mathbf{u}|\theta)\propto\exp\left\{-\frac{1}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right\}\)

  2. We can factorize the likelihood as: \(\pi(\mathbf{y}|\mathbf{u},\theta) = \prod_i\pi(y_i|\eta_i,\theta)\)

    • Data are conditionally independent given \(\mathbf{u}\) and \(\theta\)
    • Each data point depends on only one element of the latent field: the predictor \(\eta_i\)
    • \(\eta\) is a linear combination of the elements of \(\mathbf{u}\): \(\eta = \mathbf{A}^T\mathbf{u}\)

Hierarchical GMRF

  • \(\mathbf{y}|\mathbf{u},\theta\), Observation model (likelihood)
  • \(\mathbf{u}|\theta\), Gaussian random field
  • \(\theta\) Hyperparameters

Extra requirements

  1. The latent field \(\mathbf{u}\) is a GMRF \(\pi(\mathbf{u}|\theta)\propto\exp\left\{-\frac{1}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right\}\)

  2. We can factorize the likelihood as: \(\pi(\mathbf{y}|\mathbf{u},\theta) = \prod_i\pi(y_i|\eta_i,\theta)\)

  3. The vector of hyperparameters \(\theta\) is low dimensional.

Inference

Main Inferential Goal:

\[\begin{aligned} \overbrace{\pi(\mathbf{u}, {\theta}\mid\mathbf{y})}^{{\text{Posterior}}} &\propto \overbrace{\pi({\theta}) \pi(\mathbf{u}\mid{\theta})}^{{\text{Prior}}} \overbrace{\prod_{i}\pi(y_i \mid \eta_i, {\theta})}^{{\text{Likelihood}}} \end{aligned}\]
  • The posterior distribution is, in general, not available in closed form…
  • ..what to do then?

INLA strategy in short

Assumption

We are mainly interested in the posterior marginals \(\pi(u_i|\mathbf{y})\) and \(\pi(\theta_j|\mathbf{y})\)

Strategy

  • We use numerical integration in a smart way
  • We approximate what we do not know analytically, exploiting the Gaussian structure of \(\mathbf{u}|\theta\)

Numerical Integration

We want to compute \(\pi(u_i|\mathbf{y})\)

Hard way 👎

\[ \pi(u_i|\mathbf{y}) = \int\pi(\mathbf{u}|\mathbf{y})\ d\mathbf{u}_{-i} \]

  • This is a hard integral, as \(\mathbf{u}\) can be very high dimensional!

Easier way 👍

\[ \pi(u_i|\mathbf{y}) = \int\pi(u_i|\theta, \mathbf{y})\pi(\theta|\mathbf{y})\ d\theta \]

  • \(\theta\) is a lower-dimensional vector, easier to integrate over
  • BUT we need to approximate both \(\pi(\theta|\mathbf{y})\) and \(\pi(u_i|\theta, \mathbf{y})\)
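The integration step can be sketched with stand-in densities (toy quantities of my own choosing, not INLA output): a grid of \(\theta\) values weighted by an approximation to \(\pi(\theta|\mathbf{y})\), combined with Gaussian stand-ins for \(\pi(u_i|\theta,\mathbf{y})\):

```python
# Numerical integration over theta: pi(u_i|y) ~ sum_k pi(u_i|theta_k, y) w_k
import numpy as np

def norm_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

theta_grid = np.linspace(-2, 2, 21)
w = norm_pdf(theta_grid, 0.0, 1.0)   # stand-in for pi(theta_k | y)
w /= w.sum()                         # integration weights sum to 1

u = np.linspace(-6, 6, 401)
# Stand-in conditionals: pi(u_i | theta, y) = N(theta, 1)
marginal = sum(wk * norm_pdf(u, tk, 1.0) for tk, wk in zip(theta_grid, w))

# The mixture is a proper density: it integrates to ~1
total = (marginal * (u[1] - u[0])).sum()
print(total)   # ~ 1.0
```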

Approximating the posterior for the hyperparameters

We want to build an approximation to \(\pi(\theta|\mathbf{y})\)

\[ \pi(\mathbf{\theta} \mid \mathbf{y}) = \frac{ \pi(\mathbf{u}, \mathbf{ \theta }\mid \mathbf{y})} {\pi(\mathbf{u} \mid \mathbf{y}, \mathbf{\theta})} \: \Bigg|_{\text{This is just Bayes' theorem! It is valid for any value of } \mathbf{u}} \]

Approximating the posterior for the hyperparameters

We want to build an approximation to \(\pi(\theta|\mathbf{y})\)

\[\begin{aligned} \pi(\mathbf{\theta} \mid \mathbf{y}) & = \frac{ \pi(\mathbf{u}, \mathbf{ \theta }\mid \mathbf{y})} {\pi(\mathbf{u} \mid \mathbf{y}, \mathbf{\theta})} \\ & \propto \frac{ \overbrace{\pi( \mathbf{y}\mid \mathbf{u},\mathbf{\theta})\ \pi(\mathbf{\theta})}^{\text{Non-Gaussian, KNOWN}} \overbrace{\pi( \mathbf{u}\mid \mathbf{\theta})}^{\text{ Gaussian, KNOWN}} } {\underbrace{\pi(\mathbf{u} \mid \mathbf{y}, \mathbf{\theta})}_{\text{Non-Gaussian, UNKNOWN}}} \end{aligned}\]
  • We approximate \(\pi(\mathbf{u} \mid \mathbf{y},\mathbf{\theta})\) with a Gaussian distribution \(\pi_G(\mathbf{u} \mid \mathbf{y},\mathbf{\theta})\)

The GMRF Approximation

\[ \begin{aligned} \pi(\mathbf{u}\mid {\theta},\mathbf{y}) & \propto \pi(\mathbf{u}\mid {\theta}) \pi(\mathbf{y}|\mathbf{u},\theta) \\ & \propto \exp\left\{-\frac{1}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u} + \sum \log \pi(y_i|\eta_i,\theta)\right\}\\ \end{aligned} \]

\[ \begin{aligned} & \approx \exp\left\{-\frac{1}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u} + \sum (b_i\eta_i - \frac{1}{2}c_i\eta_i^2)\right\} \end{aligned} \]

Recall \[ \begin{aligned} f(x) \approx f(x_0) + f'(x_0)(x-x_0)+ \frac{1}{2} f''(x_0)(x-x_0)^2 = a+ bx - \frac{1}{2}cx^2 \end{aligned} \] with

\[ \begin{aligned} b &=f'(x_0) - f''(x_0)x_0\\ c &= -f''(x_0) \end{aligned} \]

The GMRF Approximation

\[ \begin{aligned} \pi(\mathbf{u}\mid {\theta},\mathbf{y}) & \propto \pi(\mathbf{u}\mid {\theta}) \pi(\mathbf{y}|\mathbf{u},\theta) \\ & \propto \exp\left\{-\frac{1}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u} + \sum \log \pi(y_i|\eta_i,\theta)\right\}\\ & \approx \exp \left\{ -\frac{1}{2} \mathbf{u}^T\tilde{\mathbf{Q}} \mathbf{u} + \tilde{\mathbf{b}}^T\mathbf{u}\right\} \end{aligned} \] Which means that we approximate \(\pi(\mathbf{u}\mid {\theta},\mathbf{y})\) with

\[ \mathcal{N}(\tilde{\mathbf{Q}}^{-1}\tilde{\mathbf{b}},\tilde{\mathbf{Q}}^{-1})\ \text{ with }\ \tilde{\mathbf{Q}} = \mathbf{Q} +\begin{bmatrix} \text{diag}(\mathbf{c}) & 0\\ 0& 0 \end{bmatrix}\qquad \tilde{\mathbf{b}} = [\mathbf{b}\ 0]^T \]
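The iteration behind this approximation can be sketched for a Poisson likelihood with \(\eta_i = u_i\) (a toy implementation of my own, not the INLA code): repeatedly Taylor-expand the log-likelihood at the current mean until the Gaussian approximation stops moving.

```python
# GMRF approximation for Poisson data: iterate the second-order
# Taylor expansion of log pi(y_i|eta_i) until convergence.
import numpy as np

rng = np.random.default_rng(1)
phi, n = 0.7, 10
# AR1-style tridiagonal prior precision Q
Q = (np.diag([1.0] + [1 + phi**2] * (n - 2) + [1.0])
     + np.diag([-phi] * (n - 1), 1) + np.diag([-phi] * (n - 1), -1))

y = rng.poisson(2.0, size=n).astype(float)  # data, lambda_i = exp(u_i)

u = np.zeros(n)
for _ in range(50):
    # f(u) = y*u - exp(u):  b = f'(u0) - f''(u0)*u0,  c = -f''(u0)
    c = np.exp(u)
    b = y - np.exp(u) + c * u
    Q_tilde = Q + np.diag(c)
    u_new = np.linalg.solve(Q_tilde, b)     # mean of the Gaussian approx
    if np.max(np.abs(u_new - u)) < 1e-10:
        u = u_new
        break
    u = u_new

# At convergence the mean solves the score equation Q u = y - exp(u),
# i.e. it is the mode of pi(u | theta, y)
assert np.allclose(Q @ u, y - np.exp(u), atol=1e-8)
```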

The GMRF approximation - One-dimensional example

Assume \[ \begin{aligned} y|\lambda \sim\text{Poisson}(\lambda) & \text{ Likelihood}\\ \eta = \log(\lambda) & \text{ Predictor}\\ u = \eta & \text{ Latent field }\\ u\sim\mathcal{N}(0,1) & \text{ Latent model} \end{aligned} \] we have that

\[ \begin{aligned} \pi(u|y) & \propto\pi(y|u)\pi(u)\\ & \propto\exp\left\{ -\frac{1}{2}u^2+ \underbrace{uy-\exp(u)}_{\text{non-Gaussian part}} \right\} \end{aligned} \]
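Worked numerically (a sketch with an arbitrary observed value \(y=3\)): Newton's method finds the mode of the log-posterior above, and the curvature there gives the Gaussian approximation \(\mathcal{N}(u^\star,\, 1/(1+e^{u^\star}))\).

```python
# One-dimensional GMRF approximation for the Poisson example:
# log pi(u|y) = -u^2/2 + u*y - exp(u) + const
import numpy as np

y = 3.0
u = 0.0
for _ in range(50):
    grad = -u + y - np.exp(u)      # first derivative of log pi(u|y)
    hess = -1.0 - np.exp(u)        # second derivative (always negative)
    step = grad / hess
    u -= step
    if abs(step) < 1e-12:
        break

mode, var = u, 1.0 / (1.0 + np.exp(u))
print(mode, var)
```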

The GMRF approximation

  • 😄 Easy to compute
  • 😄 Exact for Gaussian likelihood
  • 😄 Good when you have little or a lot of data
  • 😄 Inherits the sparsity structure of the prior precision of \(\mathbf{u}\)
  • 😟 No skewness

The Laplace Approximation

\[ \begin{aligned} \pi(\mathbf{\theta} \mid \mathbf{y}) & \propto \frac{ \pi( \mathbf{y}\mid \mathbf{u},\mathbf{\theta})\ \pi(\mathbf{\theta}) \pi( \mathbf{u}\mid \mathbf{\theta}) } {\pi(\mathbf{u} \mid \mathbf{y}, \mathbf{\theta})}\\ & \approx \frac{ \pi( \mathbf{y}\mid \mathbf{u},\mathbf{\theta})\ \pi(\mathbf{\theta}) \pi( \mathbf{u}\mid \mathbf{\theta}) } {\pi_G(\mathbf{u} \mid \mathbf{y}, \mathbf{\theta})} \: \Bigg|_{\mathbf{u} = \mathbf{u}^\star(\mathbf{\theta})} \end{aligned} \]

If \(\theta\) is low-dimensional we can actually compute this!

This is the Laplace approximation to \(\pi(\theta|\mathbf{y})\) !
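A toy version of this computation (entirely my own construction: one latent variable, a single Poisson observation, and an assumed Exp(1) prior on the hyperparameter \(\tau\), the prior precision of \(u\)) evaluates the Laplace approximation on a grid of \(\tau\) values:

```python
# Laplace approximation to pi(tau | y) for a toy model:
# u | tau ~ N(0, 1/tau),  y | u ~ Poisson(exp(u)),  tau ~ Exp(1)
import numpy as np

y = 4.0

def log_post_tau(tau):
    # Mode of log pi(u | tau, y) = -tau*u^2/2 + y*u - exp(u) + const
    u = 0.0
    for _ in range(50):
        u -= (-tau * u + y - np.exp(u)) / (-tau - np.exp(u))
    prec = tau + np.exp(u)                  # curvature at the mode
    log_prior_tau = -tau                    # Exp(1) prior (up to a constant)
    log_prior_u = 0.5 * np.log(tau) - 0.5 * tau * u**2
    log_lik = y * u - np.exp(u)             # Poisson log-lik (up to a constant)
    log_gauss_at_mode = 0.5 * np.log(prec)  # log pi_G(u* | tau, y), up to const
    # Laplace: prior * likelihood / Gaussian approx, evaluated at u = u*(tau)
    return log_prior_tau + log_prior_u + log_lik - log_gauss_at_mode

taus = np.linspace(0.05, 5.0, 100)
log_post = np.array([log_post_tau(t) for t in taus])
post = np.exp(log_post - log_post.max())
post /= post.sum() * (taus[1] - taus[0])    # normalise on the grid
```

The same recipe, with the GMRF approximation replacing the one-dimensional Newton step, is what INLA evaluates for each \(\theta\) in its integration grid.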

The INLA scheme

  1. Approximate \(\pi(\theta|\mathbf{y})\)
  2. Approximate \(\pi(u_i|\theta,\mathbf{y})\)
  3. Use numerical integration to compute \(\pi(u_i|\mathbf{y})\)

Approximating the posterior for the latent field.

  • Now we have an approximation for \(\pi(\theta|\mathbf{y})\)

  • We need an approximation for \(\pi(u_i|\theta,\mathbf{y})\)

  • This is more difficult… \(\mathbf{u}\) is large, so whatever we do, we have to do it many times!

Can we use the marginal from \(\pi_G(\mathbf{u}|\theta, \mathbf{y})\) ?

…unfortunately it is not accurate enough!

(Figure from Rue and Martino, 2007)

Notes:

  • Results are quite accurate in general
  • There can be errors in the location and/or errors due to the lack of skewness…

INLA and Variational Bayes

  • Variational Bayes techniques are used to improve the mean of the GMRF approximation \(\pi_G(\mathbf{u}|\theta, \mathbf{y})\)
  • From this improved GMRF approximation we can compute the marginals \(\pi_G^*(u_i|\theta,\mathbf{y})\) that can then be used to numerically compute: \[ \tilde{\pi}(u_i|\mathbf{y}) = \sum_k\pi_G^*(u_i|\theta_k, \mathbf{y})\tilde{\pi}(\theta_k|\mathbf{y})\ w_k \]

NOTE:

  • We know the precision matrix \(\mathbf{Q}\) of the GMRF approximation \(\pi_G^*(\mathbf{u}|\theta, \mathbf{y})\)
  • Efficient algorithms are used to compute the marginal variances.

INLA posterior sampling

Although INLA mainly computes posterior marginals

  • \(\pi(\theta|\mathbf{y})\)
  • \(\pi(u_i|\mathbf{y})\)

INLA also allows sampling from the approximate posterior \(\pi(\mathbf{u}|\mathbf{y})\)

This is important if one is interested in the posterior of functions of one or more parameters.

Take home messages!

  • The INLA methodology heavily relies on sparse matrices !
  • GMRF are the key tool in INLA-friendly models
  • INLA uses numerical approximation and numerical integration to provide efficient and precise approximations to posterior marginals

But ultimately….

You do not need to deeply understand all of this if you want to use inlabru to solve your (statistical) problems 😄

Good news!

Recall… we have computationally efficient and flexible methodology…

  • we can fit more (realistically) complex models…
  • e.g. spatial and spatio-temporal models (for geo-referenced data, point pattern data, areal data, spatio-temporal data…)

Using INLA + SPDE approach we can:

  • fit complex models quickly
  • flexibly fit joint models of more than one response variable
  • model on complex domains (e.g. the sphere, i.e. the earth)
  • observation areas with barriers (islands, archipelagos…)

How to use the INLA approach

The INLA methodology is implemented in two R packages: R-INLA and inlabru.

Why do we (strongly) recommend inlabru:

  • wrapper around R-INLA + extra functionality.

  • much easier to fit spatial models

  • full support for sf and terra libraries

  • makes fitting of tricky models (e.g. multi-likelihood models, spatial point processes) less fiddly than R-INLA

  • takes observation process into account (e.g. for distance sampling, plot sampling); more on this later

  • allows for non-linear predictors defined by general R expressions (this is done by linearizing and running the INLA approximation several times…)